balanced distribution




Transformers Struggle to Learn to Search

Saparov, Abulhair, Pawar, Srushti, Pimpalgaonkar, Shreyas, Joshi, Nitish, Pang, Richard Yuanzhe, Padmakumar, Vishakh, Kazemi, Seyed Mehran, Kim, Najoung, He, He

arXiv.org Artificial Intelligence

Search is a foundational ability in many important tasks, and recent studies have shown that large language models (LLMs) struggle to perform search robustly. It is unknown whether this inability is due to a lack of data, insufficient model parameters, or fundamental limitations of the transformer architecture. In this work, we use the foundational graph connectivity problem as a testbed to generate effectively limitless high-coverage data to train small transformers and test whether they can learn to perform search. We find that, when given the right training distribution, the transformer is able to learn to search. We analyze the algorithm that the transformer has learned through a novel mechanistic interpretability technique that enables us to extract the computation graph from the trained model. We find that for each vertex in the input graph, transformers compute the set of vertices reachable from that vertex. Each layer then progressively expands these sets, allowing the model to search over a number of vertices exponential in the number of layers. However, we find that as the input graph size increases, the transformer has greater difficulty in learning the task. This difficulty is not resolved even as the number of parameters is increased, suggesting that increasing model scale will not lead to robust search abilities. We also find that performing search in-context (i.e., chain-of-thought) does not resolve this inability to learn to search on larger graphs.
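The layer-wise mechanism the abstract describes can be sketched in plain Python (this is an illustration of the exponential reachability-doubling idea, not the circuit extracted in the paper): each "layer" merges every vertex's reachability set with the sets of the vertices it already reaches, so L layers cover paths of length up to 2^L.

```python
def reachable_after_layers(edges, num_layers):
    """edges: set of (u, v) pairs. Returns a dict mapping each vertex to the
    set of vertices reachable from it via paths of length <= 2**num_layers."""
    vertices = {u for e in edges for u in e}
    # Layer 0: each vertex reaches itself and its direct successors.
    reach = {v: {v} | {w for (u, w) in edges if u == v} for v in vertices}
    # Each layer composes the reachability relation with itself,
    # doubling the maximum path length it covers.
    for _ in range(num_layers):
        reach = {v: set().union(*(reach[w] for w in reach[v]))
                 for v in vertices}
    return reach
```

On a path graph 1→2→3→4→5, two such merge steps already connect vertex 1 to vertex 5 (path length 4), whereas zero steps only reach direct neighbors.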


Align, Distill, and Augment Everything All at Once for Imbalanced Semi-Supervised Learning

Aimar, Emanuel Sanchez, Helgesen, Hannah, Felsberg, Michael, Kuhlmann, Marco

arXiv.org Artificial Intelligence

Addressing class imbalance in long-tailed semi-supervised learning (SSL) poses significant challenges stemming from differences between the marginal distributions of the unlabeled and labeled data, as the former is often unknown and potentially distinct from the latter. The first challenge is to avoid biasing the pseudo-labels towards an incorrect distribution, such as that of the labeled data or a balanced distribution, during training. However, we still wish to ensure a balanced unlabeled distribution during inference, which is the second challenge. To address both of these challenges, we propose a three-faceted solution: a flexible distribution alignment that progressively aligns the classifier from a dynamically estimated unlabeled prior towards a balanced distribution, a soft consistency regularization that exploits underconfident pseudo-labels discarded by threshold-based methods, and a schema for expanding the unlabeled set with input data from the labeled partition. This last facet responds to the commonly overlooked fact that disjoint partitions of labeled and unlabeled data prevent the benefits of strong data augmentation on the labeled set. Our overall framework requires no additional training cycles, so it will align, distill, and augment everything all at once (ADALLO). Our extensive evaluations of ADALLO on imbalanced SSL benchmark datasets, including CIFAR10-LT, CIFAR100-LT, and STL10-LT with varying degrees of class imbalance, amount of labeled data, and distribution mismatch, demonstrate significant improvements in the performance of imbalanced SSL under large distribution mismatch, as well as competitiveness with state-of-the-art methods when the labeled and unlabeled data follow the same marginal distribution. Our code will be released upon paper acceptance.
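The "flexible distribution alignment" facet can be sketched as follows. This is our illustrative reading, not ADALLO's exact formulation: pseudo-label probabilities are rescaled from a running estimate of the unlabeled class prior towards a balanced (uniform) target, with the interpolation progressing over training.

```python
import numpy as np

def align_pseudo_labels(probs, est_prior, progress):
    """probs: (N, C) model probabilities on unlabeled data.
    est_prior: (C,) current estimate of the unlabeled class distribution.
    progress: float in [0, 1]; 0 keeps the estimated prior, 1 is fully balanced."""
    num_classes = probs.shape[1]
    balanced = np.full(num_classes, 1.0 / num_classes)
    # Interpolated target distribution for this training step.
    target = (1.0 - progress) * est_prior + progress * balanced
    # Reweight each class by target/estimated prior, then renormalize rows.
    aligned = probs * (target / np.clip(est_prior, 1e-8, None))
    return aligned / aligned.sum(axis=1, keepdims=True)
```

Under this sketch, classes the estimated prior says are over-represented get downweighted in the pseudo-labels, and the effect strengthens as `progress` approaches 1.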


Causal Balancing for Domain Generalization

Wang, Xinyi, Saxon, Michael, Li, Jiachen, Zhang, Hongyang, Zhang, Kun, Wang, William Yang

arXiv.org Artificial Intelligence

While machine learning models rapidly advance the state-of-the-art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a balanced mini-batch sampling strategy to transform a biased data distribution into a spurious-free balanced distribution, based on the invariance of the underlying causal mechanisms for the data generation process. We argue that the Bayes optimal classifiers trained on such a balanced distribution are minimax optimal across a diverse enough environment space. We also provide an identifiability guarantee for the latent variable model of the proposed data generation process, when utilizing enough training environments. Experiments are conducted on DomainBed, demonstrating empirically that our method obtains the best performance across 20 baselines reported on the benchmark.
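A minimal sketch of balanced mini-batch sampling in the spirit described above (our illustration, not the paper's exact procedure): draw an equal number of examples per class so that each mini-batch follows a class-balanced distribution regardless of the skew in the full dataset.

```python
import random
from collections import defaultdict

def balanced_minibatch(dataset, per_class, rng=random):
    """dataset: list of (x, label) pairs. Returns a mini-batch containing
    exactly `per_class` examples (drawn with replacement) from every class."""
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append((x, y))
    batch = []
    for y in sorted(by_class):
        batch.extend(rng.choices(by_class[y], k=per_class))
    rng.shuffle(batch)  # avoid class-ordered batches
    return batch
```

Sampling with replacement lets minority classes appear as often as majority classes even when they have far fewer examples, which is what turns a biased training distribution into a balanced one at the batch level.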


Finding and Fixing Spurious Patterns with Explanations

Plumb, Gregory, Ribeiro, Marco Tulio, Talwalkar, Ameet

arXiv.org Artificial Intelligence

Image classifiers often use spurious patterns, such as relying on the presence of a person to detect a tennis racket, which do not generalize. In this work, we present an end-to-end pipeline for identifying and mitigating spurious patterns for such models, under the assumption that we have access to pixel-wise object annotations. We start by identifying patterns such as "the model's prediction for tennis racket changes 63% of the time if we hide the people." Then, if a pattern is spurious, we mitigate it via a novel form of data augmentation. We demonstrate that our method identifies a diverse set of spurious patterns and that it mitigates them by producing a model that is both more accurate on a distribution where the spurious pattern is not helpful and more robust to distribution shift.
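The identification step can be sketched with a toy example (our illustration, not the paper's pipeline; `predict`, `hide_people`, and the toy data are hypothetical): remove a candidate object from each image using its annotation and measure how often the prediction for the target label flips, as in "the prediction for tennis racket changes 63% of the time if we hide the people."

```python
def flip_rate(predict, images, hide, target_label):
    """predict: image -> label; hide: removes the candidate object from an
    image. Returns the fraction of images whose prediction for
    `target_label` changes once the object is hidden."""
    flips = sum(
        (predict(img) == target_label) != (predict(hide(img)) == target_label)
        for img in images
    )
    return flips / len(images)

# Toy "spurious" classifier over images represented as sets of objects:
# it predicts "racket" whenever a person OR a racket is present.
images = [{"person", "racket"}, {"person"}, {"racket"}, {"court"}]
predict = lambda img: "racket" if {"person", "racket"} & img else "other"
hide_people = lambda img: img - {"person"}
```

In this toy setup, hiding the people flips the "racket" prediction only on the person-without-racket image, so the flip rate is 1/4; a high flip rate flags the person-to-racket shortcut as a candidate spurious pattern.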


Robot-analysts make BETTER stock recommendations than human investors, study finds

Daily Mail - Science & tech

Robots are expected to take over some 200,000 jobs on Wall Street over the next decade, and a new study suggests this prediction could soon become a reality. Following the analysis of 76,000 reports from seven different robo-analysis firms, researchers determined that the technology is able to make recommendations similar to those of its human counterparts - but faster and more accurately. Because the automation is less subject to behavioral biases and conflicts of interest, it can produce a more balanced distribution of ratings, which includes an investment's risk and suggestions on whether to hold, sell or buy. Looking at the robot portfolios, the study found their buy recommendations earned returns from 6.4 percent to 6.9 percent, while those of their human counterparts only ranged from 1.2 percent to 1.7 percent. Although robo-analysis sounds like it could weed out human investors, researchers believe that as long as there are people that need human interaction, 'the buy-side, the sell-side will still be around.' The study was conducted by a team at Indiana University, who wrote: 'Our study provides the first comprehensive analysis of the properties of investment recommendations generated by "Robo-Analysts," which are human-analyst assisted computer programs conducting automated research analysis.'


Mario Schlechter on LinkedIn: "Yesterday we showed you that we embrace the #future #mobility. Today I would like to invite you to a free training on Operide, our micro-mobility fleet management application based on #ai! So you can make sure that you provide a more balanced distribution of #eBikes or even #eScooters! Just because #itsyourcity! Join our free training!"

#artificialintelligence

Operide, our #ai driven shared micro-mobility fleet management application, optimises the rebalancing process so that more assets (bikes/scooters) are available to the end-user.